Launch and Iterate: Reducing Prediction Churn
Abstract
Practical applications of machine learning often involve successive training iterations with changes to features and training examples. Ideally, changes in the output of any new model should only be improvements (wins) over the previous iteration, but in practice the predictions may change neutrally for many examples, resulting in extra net-zero wins and losses, referred to as unnecessary churn. These changes in the predictions are problematic for the usability of some applications, and make it harder and more expensive to measure whether a change is statistically significantly positive. In this paper, we formulate the problem and present a stabilization operator to regularize a classifier towards a previous classifier. We use a Markov chain Monte Carlo stabilization operator to produce a model with more consistent predictions without adversely affecting accuracy. We investigate the properties of the proposal with theoretical analysis. Experiments on benchmark datasets for different classification algorithms demonstrate the method and the resulting reduction in churn.

1 The Curse of Version 2.0

In most practical settings, training and launching an initial machine-learned model is only the first step: as new and improved features are created, additional training data is gathered, and the model and learning algorithm are improved, it is natural to launch a series of ever-improving models. Each new candidate may bring wins, but also unnecessary changes. In practice, it is desirable to minimize any unnecessary changes, for two key reasons. First, unnecessary changes can hinder usability and debuggability, as they can be disorienting to users and follow-on system components. Second, unnecessary changes make it more difficult to measure with statistical confidence whether the change is truly an improvement.
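The churn and win/loss bookkeeping above can be made concrete with a small sketch (illustrative, not from the paper; the function names are ours), assuming hard class predictions and known labels:

```python
def churn(preds_a, preds_b):
    """Fraction of examples on which the two models disagree."""
    assert len(preds_a) == len(preds_b) and preds_a
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def wins_and_losses(preds_a, preds_b, labels):
    """Among the diffs, count wins (B right, A wrong) and losses (A right, B wrong)."""
    wins = sum(b == y != a for a, b, y in zip(preds_a, preds_b, labels))
    losses = sum(a == y != b for a, b, y in zip(preds_a, preds_b, labels))
    return wins, losses

# Toy illustration: the two models differ on two examples,
# one of which is a win for B and one a loss.
a = [0, 1, 1, 0, 1]
b = [0, 1, 0, 1, 1]
y = [0, 1, 0, 0, 1]
print(churn(a, b))               # 0.4: models differ on 2 of 5 examples
print(wins_and_losses(a, b, y))  # (1, 1): a net-zero win/loss pair, i.e. unnecessary churn
```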
For both these reasons, there is great interest in making only those changes that are wins, and minimizing any unnecessary changes, while ensuring that this process does not hinder the overall accuracy objective. There is already a large body of work in machine learning that treats the stability of learning algorithms, ranging from the early works of Devroye and Wagner [1] and Vapnik [2, 3] to more recent studies of learning stability in more general hypothesis spaces [4, 5, 6]. Most of the literature on this topic focuses on the stability of the learning algorithm in terms of the risk or loss function and how such properties translate into uniform generalization with specific convergence rates. We build on these notions, but the problem treated here is substantively different. We address the problem of training consecutive classifiers to reduce unnecessary changes in the presence of realistic evolution of the problem domain and the training sets over time. The main contributions of this paper include: (I) discussion and formulation of the "churn" metric between trained models, (II) design of stabilization operators for regularization towards a previous model, (III) proposing a Markov chain Monte Carlo (MCMC) stabilization technique, (IV) theoretical analysis of the proposed stabilization in terms of churn, and (V) empirical analysis of the proposed methods on benchmark datasets with different classification algorithms.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Table 1: Win-loss ratio (WLR) needed to establish that a change is statistically significant at the p = 0.05 level for k wins out of n diffs from a binomial distribution. The empirical WLR column shows the WLR one must actually see in the diffs. The true WLR column is the WLR the change must have so that any random draw of diffs has at least a 95% chance of producing the needed empirical WLR.
# Diffs   Min # Wins Needed   Max # Losses Allowed   Empirical WLR Needed   True WLR Needed
10        9                   1                      9.000                  26.195
100       59                  41                     1.439                  1.972
1,000     527                 473                    1.114                  1.234
10,000    5,083               4,917                  1.034                  1.068

1.1 Testing for Improvements

In the machine learning literature, it is common to compare classifiers on a fixed pre-labeled test set. However, a fixed test set has a few practical downsides. First, if many potential changes to the model are evaluated on the same dataset, it becomes difficult to avoid observing spurious positive effects that are actually due to chance. Second, the true test distribution may be evolving over time, meaning that a fixed test set will eventually diverge from the true distribution of interest. Third, and most important to our discussion, any particular change may affect only a small subset of the test examples, leaving too small a sample of differences (diffs) to determine whether a change is statistically significant.

For example, suppose one has a fixed test set of 10,000 samples with which to evaluate a classifier. Consider a change to one of the features, say a Boolean string-similarity feature, that causes the feature to match more synonyms, and suppose that re-training a classifier with this small change to this one feature impacts only 0.1% of random examples. Then only 10 of the 10,000 test examples would be affected. As shown in the first row of Table 1, given only 10 diffs, there must be 9 or more wins to declare the change statistically significantly positive at p = 0.05. Note that cross-validation (CV), even in leave-one-out form, does not solve this issue. First, we are still bound by the size of the training set, which might not include enough diffs between the two models. Second, and more importantly, the model in the previous iteration has likely seen the entire dataset, which breaks the independence assumption needed for the statistical test.
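The minimum-wins column of Table 1 can be reproduced with a one-sided exact sign test: under the null hypothesis that each diff is equally likely to be a win or a loss, we seek the smallest k with P(X >= k) <= 0.05 for X ~ Binomial(n, 1/2). A short sketch (our own code, not the authors'):

```python
from math import comb

def min_wins_needed(n_diffs, p_value=0.05):
    """Smallest number of wins out of n_diffs diffs that is significant
    under a one-sided exact sign test with null P(win) = 1/2."""
    total = 2 ** n_diffs
    tail = 0                      # running count of outcomes with >= k wins
    needed = None
    for k in range(n_diffs, -1, -1):
        tail += comb(n_diffs, k)  # P(X = k), scaled by 2**n_diffs
        if tail / total <= p_value:
            needed = k            # still significant with k wins
        else:
            break
    return needed

for n in (10, 100, 1000):
    k = min_wins_needed(n)
    print(n, k, n - k, round(k / (n - k), 3))  # matches the first three rows of Table 1
```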
To address these problems and ensure a fresh, sufficiently large test set for each comparison, practitioners often instead measure changes on a set of diffs for the proposed change. For example, to compare classifier A and B, each classifier is evaluated on a billion unlabeled examples, and then the set of diffs is defined as those examples for which classifiers A and B predict a different class.
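In code, a diff set is simply the subset of unlabeled examples on which the two classifiers disagree; only those examples then need labeling for the win/loss comparison. A toy sketch (the predictor functions are hypothetical stand-ins):

```python
def diff_set(examples, predict_a, predict_b):
    """Examples on which classifiers A and B predict different classes."""
    return [x for x in examples if predict_a(x) != predict_b(x)]

# Toy stand-in classifiers over integers: B agrees with A's parity
# prediction except that it also fires on multiples of 10.
predict_a = lambda x: x % 2
predict_b = lambda x: 1 if x % 2 or x % 10 == 0 else 0

print(diff_set(range(1, 21), predict_a, predict_b))  # [10, 20]
```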